Using Wikipedia to Validate the Terminology found in a Corpus of Basic Textbooks

Authors

  • Jorge Vivaldi
  • Luis Adrián Cabrera-Diego
  • Gerardo Sierra
  • María Pozzi
Abstract

A scientific vocabulary is a set of terms that designate scientific concepts. This set of lexical units can be used in several applications, ranging from the development of terminological dictionaries and machine translation systems to the development of lexical databases and beyond. Even though automatic term recognition systems have existed since the 1980s, this process is still mainly done by hand, since manual extraction generally yields more accurate results, although it takes more time and costs more. Some of the reasons for this are the fairly low precision and recall obtained, the domain dependence of existing tools, and the lack of available semantic knowledge needed to validate these results. In this paper we present a method that uses Wikipedia as a semantic knowledge resource to validate term candidates from a set of scientific textbooks used in the last three years of high school for mathematics, health education and ecology. The proposed method may be applied to any domain or language (assuming there is minimal coverage by Wikipedia).
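The abstract does not spell out how Wikipedia is consulted, so the following is only an illustrative sketch of one plausible reading: a term candidate counts as validated if its article reaches a chosen domain category within a few hops of the category graph. The toy dictionaries, the function name `reaches_domain`, and the `max_depth` parameter are all assumptions made for this example. They are not taken from the paper and do not use real Wikipedia data.

```python
from collections import deque

# Hypothetical toy data standing in for Wikipedia (not real content):
# article title -> its categories, and category -> its parent categories.
ARTICLE_CATEGORIES = {
    "polynomial": ["Algebra"],
    "ecosystem": ["Ecology"],
    "chapter": ["Book terminology"],
}
CATEGORY_PARENTS = {
    "Algebra": ["Mathematics"],
    "Ecology": ["Biology"],
    "Book terminology": ["Publishing"],
}


def reaches_domain(candidate, domain_category, max_depth=3):
    """Return True if the candidate's article reaches domain_category
    within max_depth category hops (breadth-first search)."""
    start = ARTICLE_CATEGORIES.get(candidate.lower())
    if start is None:
        return False  # no article, so the candidate cannot be validated
    queue = deque((category, 1) for category in start)
    seen = set(start)
    while queue:
        category, depth = queue.popleft()
        if category == domain_category:
            return True
        if depth >= max_depth:
            continue  # do not climb further than the assumed depth limit
        for parent in CATEGORY_PARENTS.get(category, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return False


# Filter a list of term candidates for the (assumed) Mathematics domain.
candidates = ["polynomial", "chapter", "homework"]
validated = [c for c in candidates if reaches_domain(c, "Mathematics")]
print(validated)
```

A depth limit is used here because category graphs tend to connect distant topics when followed far enough. The value 3 is arbitrary and would need tuning against real data.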

Related resources

How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs

Many elements contribute to the relative difficulty in acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g. could, might) are examples of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context ...

Visual Representation of Social Actors in ELT Nursery Rhymes

With the advent of globalization, especially in its third phase (see Robertson, 2003), global relations of domination have undermined abuse of power at national and local levels (Fairclough, 2001). Global ELT textbooks, as corollaries of the globalization process, are not immune to the embedment of discriminatory discourses, as various studies have shown (see for example, Gray, 2010, 2012; Baba...

A Conversation Analysis of Ellipsis and Substitution in Global Business English Textbooks

Despite the body of research on textbook evaluation from the discourse analysis perspective, cohesive devices have rarely been analyzed in English for Specific Purposes (ESP) textbooks. The acquisition and use of cohesive devices is inherent to naturalistic communication, including business interactions. Hence, L2 learners of business English should be exposed to these devices through cohesion-...

Traduction automatique statistique à partir de corpus comparables : application aux couples de langues arabe-français

The present research aims to exploit comparable corpora for Statistical Machine Translation (SMT). First, a hybrid approach based on statistical and linguistics-based information is proposed for bilingual terminology extraction from Wikipedia documents. Then, we propose a hybrid approach based on length and dictionary model for the alignment of the United Nations (UN) corpus at the sentence lev...

Review of skin cancers terminology, etiology and treatment from ancient Persian medicine view point

Background: Skin cancers are the most prevalent type of cancer among white populations, with an increasing incidence worldwide and in Iran. Scientific developments in diagnosis, screening and treatment have contributed to the relative control of these cancers. Hence, it is necessary to consider other suggested approaches of complementary and traditional ...

Publication date: 2012